Significant Phrases Detection
Authors
Abstract
The problem of determining the key words and phrases that best characterize a text document has important applications, such as building a compact index for a large-scale text processing system or using a keyword set for summarization and topic detection. We approached this problem from two perspectives. Our knowledge-poor approach is based on statistical collocation detection using the t-test and the likelihood ratio, and on applying latent semantic analysis to identify terms that are important in a particular document. The knowledge-rich approach addresses the problem using noun phrase chunking and coreference resolution. Both approaches use a decision tree classifier to decide whether a given phrase is a key word, based on a set of calculated features. We have built prototypes and compared the results of these two approaches.
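The t-test collocation scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard textbook formulation in which the observed bigram probability is compared with the probability expected if the two words were independent, and the function name `t_scores` is hypothetical.

```python
import math
from collections import Counter


def t_scores(tokens):
    """Return {(w1, w2): t} for every adjacent word pair in tokens.

    Assumption: textbook t-test for collocations, with the sample
    variance approximated by the observed bigram probability.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        observed = count / n  # sample mean of the bigram indicator
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)  # mean under independence
        # Bernoulli variance p(1 - p) is approximated by p for small p.
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed / n)
    return scores


if __name__ == "__main__":
    sample = "new york is big and new york is old".split()
    ranked = sorted(t_scores(sample).items(), key=lambda kv: kv[1], reverse=True)
    for pair, t in ranked:
        print(pair, round(t, 3))
```

A higher score suggests that the pair co-occurs more often than chance would predict. In a pipeline like the one described, such a score could serve as one of the features passed to the classifier, alongside the likelihood ratio.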
Similar resources
Determining the Boundaries and Types of Syntactic Phrases in Persian Texts
Text tokenization is the process of segmenting text into meaningful tokens such as words, phrases, and sentences. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, and text-to-speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...
Identifying Evolutionary Topic Temporal Patterns Based on Bursty Phrase Clustering
We discuss a temporal text mining task on finding evolutionary patterns of topics from a collection of article revisions. To reveal the evolution of topics, we propose a novel method for finding key phrases that are bursty and significant in terms of revision histories. Then we show a time series clustering method to group phrases that have similar burst histories, where additions and deletions...
Towards Sentiment Analysis of Financial Texts in Croatian
The paper presents the results of an experiment on sentiment analysis of Croatian texts from the finance domain. The goal of the experiment was to design a system model for automatic detection of general sentiment and polarity phrases in these texts. We have assembled a document collection from web sources writing on the financial market in Croatia and manually annotated articles from a...
A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases
The task of finding synonymous relational phrases is important in natural language understanding problems such as question answering and paraphrase detection. While this task has been addressed by many previous systems, each of these existing approaches is limited either in expressivity or in scalability. To address this challenge, we present a large-scale statistical relational method for clus...
Paraphrase Detection Using Recursive Autoencoder
In this paper, we tackle the paraphrase detection task. We present a novel recursive autoencoder architecture that learns representations of phrases in an unsupervised way. Using these representations, we are able to extract features for classification algorithms that allow us to outperform many results from previous works.